Sign language translation faces significant challenges, including regional variation and the demand for real-time processing, particularly when bridging Indian Sign Language (ISL) and American Sign Language (ASL). This work presents a system for bidirectional translation between text or voice inputs and sign video outputs. Using MediaPipe for landmark detection, SMPL-X for pose modeling, and Bezier interpolation for smooth transitions, the system renders gestures letter by letter from a JSON pose database. The architecture is modular, with components such as TextProcessor for text parsing and MotionEngine for motion generation. Voice input is transcribed with Whisper, and spoken output is produced via TTS. This design simplifies maintenance and allows additional sign languages to be incorporated in the future.
Introduction
Sign languages are essential for over 70 million deaf and hard-of-hearing people worldwide, but regional differences—like Indian Sign Language (ISL) versus American Sign Language (ASL)—make cross-language communication difficult. Most existing tech focuses on still images or simple translations, missing real-time, interactive video communication.
This paper presents a bidirectional ISL-ASL translation system that converts between text or voice inputs and avatar-based sign video outputs. It uses MediaPipe for hand/body landmark detection, SMPL-X for pose modeling, a CNN for sign recognition, and Bezier curve interpolation to smooth motion. The system handles inputs from typed text, live speech (via Whisper), or webcams and outputs animated sign videos, text, or speech.
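The Bezier smoothing step can be illustrated with a minimal sketch. This is not the paper's implementation; the function names (`cubic_bezier`, `interpolate_pose`) and the `ease` parameter are illustrative assumptions, showing only how cubic Bezier blending between two pose keyframes yields ease-in/ease-out motion:

```python
import numpy as np

def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    return (u**3) * p0 + 3 * (u**2) * t * p1 + 3 * u * (t**2) * p2 + (t**3) * p3

def interpolate_pose(start, end, steps=10, ease=0.3):
    """Blend two pose vectors along a cubic Bezier for smooth transitions.

    The inner control points are pulled toward the endpoints, which gives
    gradual acceleration at the start and deceleration at the end.
    """
    start, end = np.asarray(start, float), np.asarray(end, float)
    c1 = start + ease * (end - start)
    c2 = end - ease * (end - start)
    ts = np.linspace(0.0, 1.0, steps)
    return np.stack([cubic_bezier(start, c1, c2, end, t) for t in ts])

# Interpolate a 2-D pose vector over 5 frames; the first and last frames
# coincide exactly with the start and end keyframes.
frames = interpolate_pose([0.0, 0.0], [1.0, 2.0], steps=5)
```

The same blending applies unchanged to higher-dimensional pose vectors (e.g. the full set of SMPL-X joint parameters), since the arithmetic is element-wise.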
Trained on datasets like WLASL (ASL) and INCLUDE (ISL), the CNN achieves 92% accuracy for letter recognition. Optional air handwriting tracking with color markers boosts letter recognition to 97%. The web-based interface (Streamlit) allows users to select languages, enter text or speech, and view smooth avatar-based signing in real time. User tests show smoother animations improve understanding by ~25%, and the modular design allows future expansion to other sign languages.
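The letter-by-letter rendering from a JSON pose database can be sketched as follows. The database layout (`POSE_DB`) and the `fingerspell` helper are hypothetical stand-ins, assuming each letter maps to a short list of pose keyframes that the motion engine then smooths:

```python
import json

# Hypothetical JSON pose database: each letter maps to a list of pose
# keyframes (here 2-D for brevity; a real entry would hold full joint data).
POSE_DB = json.loads("""{
    "H": [[0.10, 0.20], [0.15, 0.25]],
    "I": [[0.30, 0.10], [0.35, 0.12]]
}""")

def fingerspell(word, pose_db):
    """Look up each letter's keyframes and concatenate them in order."""
    sequence = []
    for letter in word.upper():
        keyframes = pose_db.get(letter)
        if keyframes is None:
            continue  # skip characters with no database entry (e.g. punctuation)
        sequence.extend(keyframes)
    return sequence

# "hi" yields the two H keyframes followed by the two I keyframes.
frames = fingerspell("hi", POSE_DB)
```

In the described system, the resulting keyframe sequence would be passed to the interpolation stage before the avatar is rendered.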
Conclusion
This work delivers a practical, bidirectional translation tool for ISL and ASL, converting text, voice, or video into sign videos and vice versa. Built on MediaPipe landmark detection, SMPL-X pose modeling, and Bezier interpolation, it is implemented as modular Python with JSON pose storage and an administrative interface for updates. The architecture scales to additional sign languages and improves real-time accessibility for deaf communities.